4.1 - Goal - Learning an Economic Trade-off
Having successfully trained an agent on a single task, we now introduce a core element of strategy games: decision-making under constraints. Our second agent will learn to manage the fundamental economic trade-off between building more workers (increasing income) and building more supply (increasing capacity).
This task is designed to teach the agent to prioritize actions based on a dynamic game state, moving beyond a single, repetitive goal.
The Conceptual Challenge- Balancing Competing Goods
The agent must learn that both actions are "good" but that one is often more critical than the other depending on the situation.
+------------------------+ +-----------------------------+
| Action 1: Build SCV | | Action 2: Build Depot |
+------------------------+ +-----------------------------+
| PROS: | | PROS: |
| + Increases Income | | + Unlocks Army Growth |
| CONS: | | CONS: |
| - Consumes Supply | | - Does not increase Income |
| | | |
+-----------^------------+ +-------------^---------------+
| |
| WHICH ONE IS BETTER |
| RIGHT NOW? |
+--------------------------------+
|
+-------------v---------------+
| The Agent's Policy |
+-----------------------------+
Success Criteria
- The agent must learn to continuously produce workers.
- The agent must learn to proactively build Supply Depots to avoid being supply-blocked.
- The agent must learn to prioritize depot construction when
supply_left
is low, even if it has enough minerals to build a worker. - The episode must terminate successfully upon reaching a target of 30 workers.
Environment Design
1. Observation Space Specification
To make an informed decision, the agent now needs full visibility into the supply situation.
- Gymnasium Type:
gymnasium.spaces.Box
- Shape:
(4,)
- Data Type:
numpy.float32
Index | Source Feature | Normalization | Purpose |
---|---|---|---|
0 | self.minerals | 1000.0 | "Can I afford my chosen action?" |
1 | self.workers.amount | 50.0 | "How is my economic progress?" |
2 | self.supply_used | 200.0 | "How much pressure is on my supply?" |
3 | self.supply_cap | 200.0 | "What is my current supply limit?" |
2. Action Space Specification
The action space is expanded to allow for the new building choice.
- Gymnasium Type:
gymnasium.spaces.Discrete
- Size:
3
Action Value | Agent's Intent |
---|---|
0 | Do Nothing |
1 | Build Worker (SCV) |
2 | Build Supply (Supply Depot) |
3. Reward Function Specification
The reward function is designed to heavily punish the primary failure state (getting supply-blocked) while lightly encouraging all productive actions.
Pseudocode Logic:
function get_reward(action, game_state, last_supply_left):
reward = 0
# Event-Based Penalty for a critical failure
if game_state.supply_left == 0 and last_supply_left > 0:
reward -= 10
# Sparse Rewards for productive actions
if action == 1 and successfully_trained_worker:
reward += 1
elif action == 2 and successfully_built_depot:
reward += 2 // Slightly higher to incentivize this crucial task
return reward
This design forces the agent to develop a more sophisticated policy. It cannot simply learn to always build workers; it must learn to pay attention to its supply and act preemptively to avoid the large penalty.